Scalable Social Analytics for Live Viral Event Prediction
Abstract
Large-scale predictive social analytics have proven effective. Over the last decade, research and industrial efforts have recognized the potential value of inferences based on online behavior analysis, sentiment mining, influence analysis, epidemic spread, and the like. Most of these efforts, however, are not designed with real-time responsiveness as a first-order requirement. Typical systems perform a post-mortem analysis on volumes of historical data and validate their “predictions” against events that have already occurred. We observe that in many applications real-time predictions are critical, and delays of hours (or even minutes) can reduce their utility. For example, political campaigns could react very quickly to a scandal spreading on Facebook; content distribution networks (CDNs) could prefetch videos that are predicted to go viral soon; and online advertisement campaigns could be corrected to improve consumer reception. This paper proposes CrowdCast, a cloud-based framework that enables real-time analysis and prediction from streaming social data. As an instantiation of this framework, we tune CrowdCast to observe tweets on Twitter and predict which YouTube videos are most likely to “go viral” in the near future. To this end, CrowdCast first applies online machine learning to map natural-language tweets to a specific YouTube video. Then, tweets that do refer to videos are weighted by the perceived “influence” of the sender. Finally, the video’s spread is predicted through a sociological model, derived from the emerging structure of the graph over which the video-related tweets are (still) spreading. Combining metrics of influence and live structure, CrowdCast outputs sets of candidate videos identified as likely to become viral in the next few hours. We monitor Twitter for more than 30 days and find that CrowdCast’s real-time predictions show encouraging correlation with actual YouTube viewership in the near future.
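The three stages named in the abstract (mapping tweets to videos, weighting by sender influence, and scoring the structure of the spread graph) can be illustrated with a minimal sketch. The abstract does not specify CrowdCast's influence metric or its sociological model, so everything below is an assumption made for illustration: the `Tweet` fields, the log-damped follower count used as an influence proxy, the edges-per-sender ratio used as a structural signal, and the `rank_candidates` function itself are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    sender: str
    followers: int                    # assumed proxy for sender influence
    video_id: Optional[str]           # result of an assumed tweet-to-video mapper; None if no video
    retweet_of: Optional[str] = None  # sender being retweeted, if any


def rank_candidates(tweets, top_k=3):
    """Rank videos by a hypothetical influence-times-structure score."""
    influence = defaultdict(float)
    senders = defaultdict(set)  # video_id -> distinct senders seen so far
    edges = defaultdict(set)    # video_id -> (source, sender) spread edges
    for t in tweets:
        if t.video_id is None:
            continue  # stage 1: drop tweets that were not mapped to a video
        # stage 2: weight the tweet by a log-damped follower count (assumption)
        influence[t.video_id] += math.log1p(t.followers)
        senders[t.video_id].add(t.sender)
        if t.retweet_of is not None:
            edges[t.video_id].add((t.retweet_of, t.sender))
    scores = {}
    for vid, total in influence.items():
        # stage 3: crude structural signal, spread edges per distinct sender
        structure = 1.0 + len(edges[vid]) / len(senders[vid])
        scores[vid] = total * structure
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    stream = [
        Tweet("a", 1000, "v1"),
        Tweet("b", 10, "v1", retweet_of="a"),
        Tweet("c", 5, "v2"),
        Tweet("d", 50, None),
    ]
    print(rank_candidates(stream))
```

In a streaming setting the three dictionaries would be updated incrementally as tweets arrive rather than rebuilt per call; the batch form is used here only to keep the sketch short.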
Similar resources
Hybrid Method of Logistic Regression and Data Envelopment Analysis for Event Prediction: A Case Study (Stroke Disease)
Predictive analytics is an area of statistics that deals with extracting information from data and using it to predict trends and behavior patterns. Many mathematical models have been developed and used for prediction, and in some cases they have been found to be very strong and reliable. This paper studies different mathematical and statistical approaches for event prediction. The ...
I-SI: Scalable Architecture for Analyzing Latent Topical-Level Information From Social Media Data
We present a general visual analytics architecture that is constructed and implemented to effectively analyze unstructured social media data on a large scale. Pipelined based on a high-performance cluster configuration, MPI processing, and interactive visual analytics interfaces, our architecture, I-SI, closely integrates data-driven analytical methods and user-centered visual analytics. It cre...
Social Media Predictive Analytics
The recent explosion of social media services like Twitter, Google+ and Facebook has led to an interest in social media predictive analytics – automatically inferring hidden information from the large amounts of freely available content. It has a number of applications, including: online targeted advertising, personalized marketing, large-scale passive polling and real-time live polling, person...
Mapping Temporal Horizons
Microblogging platforms such as Twitter have recently received much attention as great sources for Live Web sensing, for real time event detection or opinion analysis. Previous works usually assumed that the tweets mainly describe “what’s happening now”. However, a large portion of tweets actually refers to time frames within the past or the future. Such messages often reflect expectations or m...
Distributed Semantic Analytics Using the SANSA Stack
A major research challenge is to perform scalable analysis of large-scale knowledge graphs to facilitate applications like link prediction, knowledge base completion and reasoning. Analytics methods which exploit expressive structures usually do not scale well to very large knowledge bases, and most analytics approaches which do scale horizontally (i.e., can be executed in a distributed environm...
Publication date: 2014